Optimal Economic Design through Deep Learning (Short paper)∗

Authors

  • Paul Dütting
  • Zhe Feng
  • Harikrishna Narasimhan
  • David C. Parkes
Abstract

Designing an auction that maximizes expected revenue is an intricate task. Despite major efforts, only the single-item case is fully understood. We explore the use of tools from deep learning on this topic. The design objective is revenue optimal, dominant-strategy incentive compatible auctions. For a baseline, we show that multi-layer neural networks can learn almost-optimal auctions for a variety of settings for which there are analytical solutions, and even without encoding characterization results into the design of the network. Our research also demonstrates the potential that deep nets have for deriving auctions with high revenue for poorly understood problems.

A fundamental result in auction theory is the characterization of revenue optimal auctions as virtual value maximizers [21]. We know, for example, that second price auctions with a suitably chosen reserve price are optimal when selling to bidders with i.i.d. values, and how to prioritize one bidder over another in settings with bidder asymmetry. Myerson’s theory is as rare as it is beautiful. In a single item auction, a bidder’s type is a single number (her value for the item), making this a single-dimensional mechanism design problem.

The design of optimal auctions for multiple items has proved much more difficult, and defied a thorough theoretical understanding. Tracing the contours of analytical results reveals the difficulty of this problem of multi-dimensional mechanism design. Decades after Myerson’s result, we do not have precise descriptions of optimal auctions with two or more bidders and more than two items. Even the design of the optimal auction for selling two items to a single buyer is not fully understood.¹ For a single additive buyer with item values i.i.d. U(0, 1), Manelli and Vincent [20] handle two items, and Giannakopoulos and Koutsoupias [14] up to six items.
Yao [27] provides the optimal design for any number of additive bidders and two items, as long as item values can take on one of two possible values.

A promising alternative is to use computers to solve problems of optimal economic design. The framework of automated mechanism design [8] suggests using algorithms for the design of optimal mechanisms. Early approaches required an explicit representation of all possible type profiles, which is exponential in the number of agents and does not scale. Others have proposed searching through a parametric subfamily of mechanisms; these approaches are not fully general [17, 18, 25, 22]. In recent years, efficient algorithms have been developed for the design of optimal, Bayesian incentive compatible (BIC) auctions in multi-bidder, multi-item settings [2, 5, 1, 3, 6, 4, 9]. But despite this, many important questions remain unsolved. For instance, while there is a characterization of optimal mechanisms as virtual-value maximizers [4, 7], relatively little is known about the exact structure of these mechanisms.

∗ Full version of the paper: Dütting et al. (2017): “Optimal auctions through deep learning,” CoRR, abs/1706.03459.
† Department of Mathematics, London School of Economics, Houghton Street, London WC2A 2AE, UK. Email: [email protected].
‡ John A. Paulson School of Engineering and Applied Sciences, Harvard University, 33 Oxford Street, Cambridge, MA 02138, USA. Email: {zhe_feng,hnarasimhan,parkes}@g.harvard.edu.
¹ Results are known for additive i.i.d. U(0, 1) values on items [20], additive, independent and asymmetric distributions on item values [9, 15, 26], additive, i.i.d. exponentially distributed item values [9] and extended to multiple items [13], additive, i.i.d. Pareto distributions on item values [19], and unit-demand valuations with item values i.i.d. U(c, c + 1), c > 0 [23].
31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.
Similar progress has not been made on the design of optimal, dominant-strategy incentive compatible (DSIC) mechanisms (rather, the emphasis has been on the design of approximate DSIC mechanisms; e.g., Hart and Nisan [19]). The disruptive developments in machine learning suggest an opportunity to use machine learning for the design of optimal economic mechanisms. The use of machine learning for mechanism design was pioneered by Dütting et al. [11], who use support vector machines to design payment rules for a given allocation rule (which can be designed to be scalable). But their framework need not provide incentive compatibility when the rule is not implementable, and it does not support design objectives stated on payments.²

We initiated our research into the use of deep learning for optimal design on the problem of multi-item, optimal auction design [10]. Subsequently, we have also investigated problems with private budgets [12], as well as problems of mechanism design without money [16] (with N. Golowich). We give here only a brief overview of the methodology and results from Dütting et al. [10].

For type profile $v = (v_1, \ldots, v_n)$ (for agents $N = \{1, \ldots, n\}$), parametrized allocation rule $g$ and payment rule $p$ (mapping reported types to an allocation and payments, respectively), with weights $w$, and with loss function $L(v; g, p) = -\sum_{i \in N} p_i^w(v)$, the machine learning problem of interest for optimal auction design can be stated as:

$$\min_{w} \; \mathbb{E}_{v \sim F_V}\big[L(v; g, p)\big] \quad \text{s.t.} \quad [\mathrm{IC}]\;\; rgt_i(w) = 0, \;\forall i \in N; \qquad [\mathrm{IR}]\;\; irp_i(w) = 0, \;\forall i \in N. \tag{1}$$

The type profile of the $n$ agents is sampled from some value distribution $F_V$.
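To make the objective in (1) concrete, the following minimal sketch estimates the expected loss from sampled type profiles. It is an illustration under stated assumptions, not the authors' implementation: a fixed second-price auction stands in for the parametrized rules (g, p), values are drawn i.i.d. from U(0, 1), and the names `second_price`, `loss`, and `empirical_loss` are hypothetical helpers introduced here.

```python
import random


def second_price(bids):
    """Stand-in for (g, p): the highest bid wins and pays the second-highest bid."""
    n = len(bids)
    winner = max(range(n), key=lambda i: bids[i])
    price = max(b for i, b in enumerate(bids) if i != winner)
    alloc = [1.0 if i == winner else 0.0 for i in range(n)]
    pay = [price if i == winner else 0.0 for i in range(n)]
    return alloc, pay


def loss(v, mechanism):
    """L(v; g, p) = -sum_i p_i(v): the negated revenue on one type profile."""
    _, pay = mechanism(v)
    return -sum(pay)


def empirical_loss(profiles, mechanism):
    """Sample average that approximates the expectation over v ~ F_V."""
    return sum(loss(v, mechanism) for v in profiles) / len(profiles)


# Assumption: two bidders with values drawn i.i.d. from U(0, 1).
rng = random.Random(0)
profiles = [[rng.random(), rng.random()] for _ in range(20000)]
estimate = empirical_loss(profiles, second_price)
```

For two i.i.d. U(0, 1) bidders the expected second-highest value is 1/3, so `estimate` should be close to -1/3. In the learning problem the mechanism would depend on the weights w, and this sample average would be minimized subject to the constraints in (1).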
The expected ex post regret for agent $i$, given parameters $w$, is

$$rgt_i(w) = \mathbb{E}_{v \sim F_V}\Big[\max_{v_i' \in V_i} u_i(v_i', v_{-i}; v_i, g, p) - u_i(v_i, v_{-i}; v_i, g, p)\Big], \tag{2}$$

where $V_i$ is the set of possible valuations for agent $i$, and $u_i(v_i', v_{-i}; v_i, g, p)$ is the utility (value minus price) to agent $i$ with valuation $v_i$ when reporting $v_i'$, when others report $v_{-i}$, and with allocation rule $g$ and payment rule $p$. Zero expected ex post regret corresponds to a mechanism that is, except on a set of measure zero, dominant-strategy incentive compatible (or strategy-proof).

The expected violation of individual rationality for agent $i$, given parameters $w$, is

$$irp_i(w) = \mathbb{E}_{v \sim F_V}\big[\max\{0, -u_i(v; v_i, g, p)\}\big]. \tag{3}$$

Zero expected violation of individual rationality corresponds to a mechanism that ensures, except on a set of measure zero, that the utility from participation is non-negative.

We use multi-layer, feed-forward neural networks to represent the parametrized economic mechanism. These networks provide differentiable, non-linear function approximations, and the training problem is solved through stochastic gradient descent together with augmented Lagrangian optimization. Our fully agnostic approach proceeds without the use of characterization results and, because of this, holds the most promise in discovering new economic designs.

The input layer of the REGRETNET architecture represents bids, and the network has two logically distinct components: the allocation network and the payment network (see Figure 1). Each network is a fully-connected, feed-forward network with multiple hidden layers (denoted h and c) and an output layer. In our experiments these networks make use of two hidden layers, each with 100 units. Each hidden unit has a sigmoidal activation function applied to a weighted sum of outputs from the previous layer. These weights form the parameters of the network.³

² Procaccia et al. [24] studied the learnability of voting rules, but without considering incentives.
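The regret and individual-rationality terms in (2) and (3) can be estimated from samples, as the rough sketch below shows. Several pieces are assumptions made for illustration: the maximum over misreports is approximated on a finite grid, fixed first-price and second-price auctions stand in for the learned rules, and `augmented_lagrangian` uses the textbook form for equality constraints (linear multiplier term plus quadratic penalty), since the text above does not spell out the exact objective used in training. All function names are hypothetical.

```python
import random


def second_price(bids):
    """Highest bid wins and pays the second-highest bid (DSIC)."""
    n = len(bids)
    w = max(range(n), key=lambda i: bids[i])
    price = max(b for i, b in enumerate(bids) if i != w)
    return ([1.0 if i == w else 0.0 for i in range(n)],
            [price if i == w else 0.0 for i in range(n)])


def first_price(bids):
    """Highest bid wins and pays its own bid (not DSIC)."""
    n = len(bids)
    w = max(range(n), key=lambda i: bids[i])
    return ([1.0 if i == w else 0.0 for i in range(n)],
            [bids[i] if i == w else 0.0 for i in range(n)])


def utility(i, value, bids, mech):
    """u_i: true value times allocation, minus payment, for the reported bids."""
    alloc, pay = mech(bids)
    return value * alloc[i] - pay[i]


def ex_post_regret(i, v, mech, grid):
    """Best gain from a misreport on a finite grid (approximates the max over V_i)."""
    truthful = utility(i, v[i], v, mech)
    best = truthful
    for report in grid:
        bids = list(v)
        bids[i] = report
        best = max(best, utility(i, v[i], bids, mech))
    return best - truthful


def ir_violation(i, v, mech):
    """max{0, -u_i} under truthful reporting, as in equation (3)."""
    return max(0.0, -utility(i, v[i], v, mech))


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


def augmented_lagrangian(loss, constraints, multipliers, rho):
    """Assumed textbook form: loss + sum(lambda_k c_k) + (rho / 2) sum(c_k^2)."""
    linear = sum(l * c for l, c in zip(multipliers, constraints))
    quadratic = 0.5 * rho * sum(c * c for c in constraints)
    return loss + linear + quadratic


rng = random.Random(0)
profiles = [[rng.random(), rng.random()] for _ in range(2000)]
grid = [k / 20 for k in range(21)]  # candidate misreports in [0, 1]

rgt_second = [mean(ex_post_regret(i, v, second_price, grid) for v in profiles)
              for i in range(2)]
rgt_first = [mean(ex_post_regret(i, v, first_price, grid) for v in profiles)
             for i in range(2)]
irp_first = [mean(ir_violation(i, v, first_price) for v in profiles)
             for i in range(2)]
loss_first = mean(-sum(first_price(v)[1]) for v in profiles)
objective = augmented_lagrangian(loss_first, rgt_first + irp_first,
                                 [1.0] * 4, rho=1.0)
```

The second-price auction has zero estimated regret because truthful bidding is a dominant strategy, while the first-price auction has strictly positive regret because a winner can gain by shading her bid. The penalty terms therefore raise the first-price objective above its raw loss.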
For a given bid profile $b$, illustrated here as providing a number for each agent for each of $m$ items, the allocation network outputs a vector of allocation probabilities $z_{1j}(b), \ldots, z_{nj}(b)$, for each item $j \in [m]$, through a softmax activation function, with $\sum_{i=1}^{n} z_{ij}(b) \le 1$ for each item $j \in [m]$. Bundling of items is possible because the value on output units corresponding to allocating each of two different items to the same agent can be correlated. In another variation, we handle unit-demand valuations by using an additional set of softmax activation functions, one per agent, and taking the minimum of these item-wise and agent-wise softmax components in defining the output layer. The output layer of the payment network defines the payment for each agent for a given type profile, and makes use of ReLU activation units ($\mathrm{relu}(s) = \max\{s, 0\}$).
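The output layers described above can be sketched in plain Python as follows. This is a hedged illustration rather than the paper's code. Two details are assumptions: each softmax includes one extra "dummy" score (for an item staying unallocated, or an agent receiving nothing), which is one way to obtain sums that are at most 1, and the unit-demand variant takes separate score matrices for its item-wise and agent-wise softmax components. All function and variable names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def itemwise_allocation(scores, dummy):
    """scores[i][j]: raw score for giving item j to agent i.
    dummy[j]: assumed extra score for leaving item j unallocated.
    Returns z with sum_i z[i][j] <= 1 for every item j."""
    n, m = len(scores), len(scores[0])
    z = [[0.0] * m for _ in range(n)]
    for j in range(m):
        probs = softmax([scores[i][j] for i in range(n)] + [dummy[j]])
        for i in range(n):
            z[i][j] = probs[i]
    return z


def agentwise_allocation(scores, dummy):
    """One softmax per agent over items; dummy[i] is the assumed score for
    agent i receiving nothing. Returns z with sum_j z[i][j] <= 1."""
    return [softmax(row + [dummy[i]])[:-1] for i, row in enumerate(scores)]


def unit_demand_allocation(item_scores, item_dummy, agent_scores, agent_dummy):
    """Element-wise minimum of the item-wise and agent-wise components."""
    zi = itemwise_allocation(item_scores, item_dummy)
    za = agentwise_allocation(agent_scores, agent_dummy)
    return [[min(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(zi, za)]


def relu(s):
    return max(s, 0.0)


def payments(raw_outputs):
    """Payment output layer: ReLU keeps every payment non-negative."""
    return [relu(s) for s in raw_outputs]


# Example with n = 2 agents and m = 2 items (all scores are made up).
item_scores = [[2.0, -1.0], [0.5, 0.5]]
agent_scores = [[1.0, 0.0], [-0.5, 1.5]]
z = itemwise_allocation(item_scores, [0.0, 0.0])
z_unit = unit_demand_allocation(item_scores, [0.0, 0.0],
                                agent_scores, [0.0, 0.0])
pay = payments([-0.3, 1.2])
```

Taking the element-wise minimum keeps each entry below both components, so the unit-demand output satisfies the per-item and the per-agent constraint at the same time.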

Similar resources

P-V-L Deep: A Big Data Analytics Solution for Now-casting in Monetary Policy

The development of new technologies has confronted the entire domain of science and industry with issues of big data's scalability as well as its integration with the purpose of forecasting analytics in its life cycle. In predictive analytics, the forecast of near-future and recent past - or in other words, the now-casting - is the continuous study of real-time events and constantly updated whe...

A Grouping Hotel Recommender System Based on Deep Learning and Sentiment Analysis

Recommender systems are important tools for users to identify their preferred items and for businesses to improve their products and services. In recent years, the use of online services for selection and reservation of hotels has witnessed booming growth. Customers’ reviews have replaced word-of-mouth marketing, but searching hotels based on user priorities is more time-consuming. This s...

Optimal Auctions through Deep Learning

Designing an auction that maximizes expected revenue is an intricate task. Despite major efforts, only the single-item case is fully understood (Myerson, 1981). In this work, we initiate the exploration of the use of tools from deep learning for the automated design of optimal auctions. The design objective is revenue optimal, dominant-strategy incentive compatible auctions. We show that multi-...

Optimal mathematical operation of a hybrid microgrid in islanded mode for improving energy efficiency using deep learning and demand side management

A deep learning method is used to predict the future value of load demand. Based on the obtained results, a new model based on forward-backward load shifting and unnecessary load shedding is presented. In addition, to increase energy efficiency, excess renewable energy has been used to produce green hydrogen. For this purpose, GAMS optimization software has been used for optimal operation of the micr...

Real-time optimal control via Deep Neural Networks: study on landing problems

Recent research on deep learning, a set of machine learning techniques able to learn deep architectures, has shown how robotic perception and action greatly benefits from these techniques. In terms of spacecraft navigation and control system, this suggests that deep architectures may be considered now to drive all or part of the onboard decision making system. In this paper this claim is invest...

Publication date: 2017